Jeremy Ames; Mustapha Farram; Rabea Radman
# Autoregressive Integrated Moving Averages
The general process for ARIMA models is the following:
+ Get the data
+ Clean the data
+ Visualize the time series data
+ Make the time series data stationary
+ Plot the correlation and autocorrelation charts
+ Construct the ARIMA model
+ Use the model to make predictions
'''Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. It provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and statistical data exploration. An extensive list of result statistics is available for each estimator. The results are tested against existing statistical packages to ensure that they are correct. The package is released under the open source Modified BSD (3-clause) license. The online documentation is hosted at statsmodels.org.'''
# Autoregressive Integrated Moving Average (ARIMA) model
'''ARIMA is a generalization of the autoregressive moving average (ARMA) model. Both models (ARIMA and ARMA) are fitted to time series data either to better understand the data or to predict future points in the series (forecasting).
ARIMA models come in two types:
1- Non-seasonal ARIMA, for non-seasonal data;
2- Seasonal ARIMA, for seasonal data.
ARIMA models are applied in cases where the data show evidence of non-stationarity.'''
# Major Components of ARIMA model in general
1- Non-seasonal ARIMA models are generally denoted ARIMA(p,d,q), where the parameters p, d, and q are non-negative integers:
a) AR(p): Autoregression: a basic regression model that utilizes the dependent relationship between a current observation and observations over a previous period;
b) I(d): Integrated: differencing observations (subtracting the observation at the previous time step from the current one) in order to make the time series stationary;
c) MA(q): Moving Average: a model that uses the dependency between an observation and a residual error from a moving average model applied to lagged observations.
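In the standard textbook notation (a sketch, not taken from this notebook), the three components can be written as:

$$\text{AR}(p):\quad y_t = c + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \varepsilon_t$$

$$\text{I}(d),\ \text{e.g. } d=1:\quad y'_t = y_t - y_{t-1}$$

$$\text{MA}(q):\quad y_t = \mu + \varepsilon_t + \sum_{i=1}^{q} \theta_i\, \varepsilon_{t-i}$$

Here $\varepsilon_t$ is white noise, the $\phi_i$ are the autoregressive coefficients, and the $\theta_i$ are the moving average coefficients.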
# Stationary vs Non-Stationary Data
To effectively use ARIMA, we need to understand stationarity in our data. So, what makes a data set stationary? A stationary series has a constant mean and variance over time.
A stationary data set will allow our model to predict that the mean and variance will be the same in future periods.
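As a quick illustration (a synthetic sketch, not the stock data used below): white noise is stationary, while its cumulative sum, a random walk, is not. The rolling mean of the noise stays near zero, while the rolling mean of the random walk drifts.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
noise = pd.Series(rng.normal(size=500))  # stationary: constant mean/variance
walk = noise.cumsum()                    # non-stationary: random walk

# Spread of the rolling mean: small for the stationary series,
# large for the random walk, whose mean drifts over time.
noise_drift = noise.rolling(100).mean().dropna().std()
walk_drift = walk.rolling(100).mean().dropna().std()
print(noise_drift, walk_drift)
```

The same rolling-statistics idea is applied to the real stock series later in this notebook.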
# What do stationary and non-stationary data look like visually?
Aspect 1: Mean
1- Stationary: constant mean;
2- Non-stationary: non-constant mean.
Aspect 2: Variance
1- Stationary: variance does not change as a function of time;
2- Non-stationary: variance changes over time; variance is a function of time.
Aspect 3: Covariance
1- Stationary: covariance does not change as a function of time;
2- Non-stationary: covariance changes over time.
# What do stationary and non-stationary data look like mathematically?
There are statistical tests for stationarity, such as the Augmented Dickey-Fuller test, available in Python's statsmodels.
# Conclusion
# General process for ARIMA models
+ Get the Time Series Data
+ Clean the Time Series Data
+ Visualize the Time Series Data
+ Make the time series data stationary
+ Plot the Correlation and Autocorrelation Charts
+ Construct the ARIMA Model
+ Use the model to make predictions
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.pyplot as plt
%matplotlib inline
# Register converters to avoid warnings
pd.plotting.register_matplotlib_converters()
plt.rc("figure", figsize=(16,8))
plt.rc("font", size=14)
# Get and check the Data
df=pd.read_csv('stock_px_2.csv')
# check the head of DataFrame df
df.head()
# check the index of the columns
df.columns
# B_1: Rename the column 'Unnamed: 0' (rename returns a copy; df1 itself is unchanged)
df1 = pd.read_csv('stock_px_2.csv')
df1.rename(columns={'Unnamed: 0':'Dates'})
# B_2: Assign the renamed DataFrame
df2 = df1.rename(columns={'Unnamed: 0':'Dates'})
# check the head of DataFrame df2
df2.head()
# check the tail of DataFrame df2
df2.tail()
# B_3: Convert the Dates column to datetime and set it as the index
df2['Dates'] = pd.to_datetime(df2['Dates'])
df2 = df2.set_index('Dates')
df2.head()
# check the format of the index
df2.index
# Describe the parameters of df2
df2.describe().transpose()
# Transform DataFrames to Time series
s1_AAPL = df['AAPL']
s2_MSFT = df['MSFT']
s3_XOM = df['XOM']
s4_SPX = df['SPX']
# check the type of each series
type(s1_AAPL)
type(s2_MSFT)
type(s3_XOM)
type(s4_SPX)
# Display the head of the series
s1_AAPL.head(),s2_MSFT.head(),s3_XOM.head(),s4_SPX.head()
s1_AAPL.rolling(12).mean().plot(label='12 Month Rolling Mean')
s1_AAPL.rolling(12).std().plot(label='12 Month Rolling std')
s1_AAPL.plot(label='AAPL')
plt.legend()
s2_MSFT.rolling(12).mean().plot(label='12 Month Rolling Mean')
s2_MSFT.rolling(12).std().plot(label='12 Month Rolling std')
s2_MSFT.plot(label='MSFT')
plt.legend()
s3_XOM.rolling(12).mean().plot(label='12 Month Rolling Mean')
s3_XOM.rolling(12).std().plot(label='12 Month Rolling std')
s3_XOM.plot(label='XOM')
plt.legend()
s4_SPX.rolling(12).mean().plot(label='12 Month Rolling Mean')
s4_SPX.rolling(12).std().plot(label='12 Month Rolling std')
s4_SPX.plot(label='SPX')
plt.legend()
s1_AAPL.index, s2_MSFT.index, s3_XOM.index, s4_SPX.index
# Error, trend and seasonality (ETS) decomposition
from statsmodels.tsa.seasonal import seasonal_decompose
decomp_1 = seasonal_decompose(s1_AAPL,freq=12)
Fig_1 = decomp_1.plot()
Fig_1.set_size_inches(15,18)
decomp_2 = seasonal_decompose(s2_MSFT,freq=12)
Fig_2 = decomp_2.plot()
Fig_2.set_size_inches(15,18)
decomp_3 = seasonal_decompose(s3_XOM,freq=12)
Fig_3 = decomp_3.plot()
Fig_3.set_size_inches(15,18)
decomp_4 = seasonal_decompose(s4_SPX,freq=12)
Fig_4 = decomp_4.plot()
Fig_4.set_size_inches(15,18)
# Augmented Dickey-Fuller test
# check the DataFrame
df2.head()
- Null hypothesis: the time series is non-stationary (has a unit root);
- Alternative hypothesis: the series has no unit root and is actually stationary.
We decide based on the p-value:
a) a small p-value, typically less than 0.05, indicates strong evidence against the null hypothesis,
meaning we reject the null hypothesis (non-stationary time series);
b) a large p-value, greater than 0.05, indicates weak evidence against the null hypothesis,
meaning we fail to reject the null hypothesis.
# Running the Augmented Dickey-Fuller test on the data
from statsmodels.tsa.stattools import adfuller
result_1_AAPL = adfuller(df2['AAPL'])
result_1_AAPL
def adf_check(s1_AAPL):
    result_1_AAPL = adfuller(s1_AAPL)
    print("Augmented Dickey-Fuller Test")
    labels = ['ADF Test Statistic','p-value','# of lags used','Num of observations used','Critical values','Maximized information criterion']
    for value,label in zip(result_1_AAPL,labels):
        print(label+" : "+str(value))
    if result_1_AAPL[1] <= 0.05:
        print("Strong evidence against null hypothesis")
        print('reject null hypothesis')
        print("AAPL Data has no unit root and is stationary")
    else:
        print('weak evidence against null hypothesis')
        print('fail to reject null hypothesis')
        print('AAPL Data has a unit root, it is non-stationary')
adf_check(df2['AAPL'])
result_2_MSFT = adfuller(df2['MSFT'])
result_2_MSFT
def adf_check(s2_MSFT):
    result_2_MSFT = adfuller(s2_MSFT)
    print("Augmented Dickey-Fuller Test")
    labels = ['ADF Test Statistic','p-value','# of lags used','Num of observations used','Critical values','Maximized information criterion']
    for value,label in zip(result_2_MSFT,labels):
        print(label+" : "+str(value))
    if result_2_MSFT[1] <= 0.05:
        print("Strong evidence against null hypothesis")
        print('reject null hypothesis')
        print("MSFT Data has no unit root and is stationary")
    else:
        print('weak evidence against null hypothesis')
        print('fail to reject null hypothesis')
        print('MSFT Data has a unit root, it is non-stationary')
adf_check(df2['MSFT'])
result_s3_XOM = adfuller(df2['XOM'])
result_s3_XOM
def adf_check(s3_XOM):
    result_s3_XOM = adfuller(s3_XOM)
    print("Augmented Dickey-Fuller Test")
    labels = ['ADF Test Statistic','p-value','# of lags used','Num of observations used','Critical values','Maximized information criterion']
    for value,label in zip(result_s3_XOM,labels):
        print(label+" : "+str(value))
    if result_s3_XOM[1] <= 0.05:
        print("Strong evidence against null hypothesis")
        print('reject null hypothesis')
        print("XOM Data has no unit root and is stationary")
    else:
        print('weak evidence against null hypothesis')
        print('fail to reject null hypothesis')
        print('XOM Data has a unit root, it is non-stationary')
adf_check(df2['XOM'])
result_s4_SPX = adfuller(df2['SPX'])
result_s4_SPX
def adf_check(s4_SPX):
    result_s4_SPX = adfuller(s4_SPX)
    print("Augmented Dickey-Fuller Test")
    labels = ['ADF Test Statistic','p-value','# of lags used','Num of observations used','Critical values','Maximized information criterion']
    for value,label in zip(result_s4_SPX,labels):
        print(label+" : "+str(value))
    if result_s4_SPX[1] <= 0.05:
        print("Strong evidence against null hypothesis")
        print('reject null hypothesis')
        print("SPX Data has no unit root and is stationary")
    else:
        print('weak evidence against null hypothesis')
        print('fail to reject null hypothesis')
        print('SPX Data has a unit root, it is non-stationary')
adf_check(df2['SPX'])
Note: we have now realized that our data is seasonal (it is also pretty obvious from the plot itself). This means we need to use seasonal ARIMA for our model. If our data were not seasonal, we could use plain ARIMA instead. We will take this into account when differencing our data! Typically, financial stock data won't be seasonal, but that is the point of this section: to show common methods that won't work well on stock data.
s1_AAPL; s3_XOM; s4_SPX
df2
# First difference: subtract the previous observation from the current one
df2['AAPL First Difference'] = df2['AAPL'] - df2['AAPL'].shift(1)
df2
df2['AAPL First Difference'].plot()
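The shift-and-subtract pattern above is equivalent to pandas' built-in `diff()`; a tiny sketch on toy data (not the stock series):

```python
import pandas as pd

s = pd.Series([10.0, 12.0, 15.0, 14.0])
first_diff = s - s.shift(1)          # manual first difference
assert first_diff.equals(s.diff())   # diff() does exactly the same thing
print(first_diff.tolist())           # [nan, 2.0, 3.0, -1.0]
```

Note that differencing always loses the first observation (it becomes NaN), which is why `.dropna()` is applied before running the ADF test below.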
# Store in a function for later use!
def adf_check(time_series):
    """
    Pass in a time series; prints an ADF report
    """
    result = adfuller(time_series)
    print('Augmented Dickey-Fuller Test:')
    labels = ['ADF Test Statistic','p-value','#Lags Used','Number of Observations Used']
    for value,label in zip(result,labels):
        print(label+' : '+str(value))
    if result[1] <= 0.05:
        print("strong evidence against the null hypothesis, reject the null hypothesis. Data has no unit root and is stationary")
    else:
        print("weak evidence against null hypothesis, time series has a unit root, indicating it is non-stationary")
adf_check(df2['AAPL First Difference'].dropna())
df2['AAPL First Difference'].plot()
# second difference
# Sometimes it would be necessary to do a second difference
# This is just for practice, we didn't need to do a second difference for AAPL
df2['AAPL Second Difference'] = df2['AAPL First Difference'] - df2['AAPL First Difference'].shift(1)
adf_check(df2['AAPL Second Difference'].dropna())
df2['AAPL Second Difference'].plot()
df2['Seasonal Difference'] = df2['AAPL']-df2['AAPL'].shift(12)
df2['Seasonal Difference'].plot()
# Visual Seasonal Difference by itself was not enough!
adf_check(df2['Seasonal Difference'].dropna())
This was just for practice; we didn't need a second difference for AAPL: the seasonal difference already shows the stationarity of the data.
# You can also do seasonal first difference
df2['Seasonal First Difference'] = df2['Seasonal Difference'] - df2['Seasonal Difference'].shift(12)
df2['Seasonal First Difference'].plot()
adf_check(df2['Seasonal First Difference'].dropna())
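To see why differencing at the seasonal lag helps, here is a synthetic sketch (toy data, not the stock series): a period-12 sine component cancels exactly under a lag-12 difference, leaving only the trend increment.

```python
import numpy as np
import pandas as pd

t = np.arange(120)
s = pd.Series(10 * np.sin(2 * np.pi * t / 12) + 0.1 * t)  # seasonal + trend

seasonal_diff = s - s.shift(12)  # seasonal difference at lag 12
# The sine repeats every 12 steps, so it cancels; what remains is the
# trend increment over 12 steps: 0.1 * 12 = 1.2
print(seasonal_diff.dropna().round(6).unique())
```

The same lag-12 shift is what `df2['AAPL'] - df2['AAPL'].shift(12)` computes above.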
An autocorrelation plot (also known as a correlogram) shows the correlation of the series with itself, lagged by x time units. So the y-axis is the correlation and the x-axis is the number of time units of lag.
So imagine taking your time series of length T, copying it, and deleting the first observation of copy #1 and the last observation of copy #2. Now you have two series of length T-1 for which you calculate a correlation coefficient. This is the value of the vertical axis at x=1 in your plots. It represents the correlation of the series lagged by one time unit. You go on and do this for all possible time lags x, and this defines the plot.
You will run these plots on your differenced/stationary data. There is a lot of great information available online for identifying and interpreting ACF and PACF plots.
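The lag-1 point of the correlogram is just the Pearson correlation between the series and itself shifted by one step; a minimal sketch on toy data:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 4.0, 3.0, 5.0, 4.0, 6.0])

# Correlate s_t with s_{t-1}: drop one observation from each end
manual = s[1:].reset_index(drop=True).corr(s[:-1].reset_index(drop=True))

# pandas computes the same quantity with Series.autocorr
print(manual, s.autocorr(lag=1))
```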
Autocorrelation Interpretation: the actual interpretation and how it relates to ARIMA models can get a bit complicated, but there are some basic common methods we can use for the ARIMA model. Our main priority here is to figure out whether we will use the AR or MA components for the ARIMA model (or both!), as well as how many lags we should use. In general you would use either AR or MA; using both is less common.
- If the autocorrelation plot shows positive autocorrelation at the first lag (lag-1), then it suggests using AR terms;
- If the autocorrelation plot shows negative autocorrelation at the first lag, then it suggests using MA terms.
from statsmodels.graphics.tsaplots import plot_acf,plot_pacf
Here we will run the ACF and PACF on multiple differenced data sets that have been made stationary in different ways; typically you would just choose a single stationary data set and continue all the way through with that.
The reason we use two here is to show the two typical types of behaviour you would see when using the ACF.
# Duplicate plots
# Check out: https://stackoverflow.com/questions/21788593/statsmodels-duplicate-charts
# https://github.com/statsmodels/statsmodels/issues/1265
fig_first = plot_acf(df2['AAPL First Difference'].dropna())
fig_seasonal_first = plot_acf(df2["Seasonal First Difference"].dropna())
Pandas also has this functionality built in, but only for ACF, not PACF. So I recommend using statsmodels, as ACF and PACF are more core to its functionality than to pandas'.
from pandas.plotting import autocorrelation_plot
autocorrelation_plot(df2['Seasonal First Difference'].dropna())
In general, a partial correlation is a conditional correlation.
It is the correlation between two variables under the assumption that we know and take into account the values of some other set of variables.
For instance, consider a regression context in which y = response variable and x1, x2, and x3 are predictor variables. The partial correlation between y and x3 is the correlation between the variables determined taking into account how both y and x3 are related to x1 and x2.
Formally, this relationship is defined as:

$$\rho_{y x_3 \mid x_1, x_2} = \frac{\mathrm{Cov}(y, x_3 \mid x_1, x_2)}{\sqrt{\mathrm{Var}(y \mid x_1, x_2)\,\mathrm{Var}(x_3 \mid x_1, x_2)}}$$
https://www.itl.nist.gov/div898/handbook/pmc/section4/pmc4463.htm
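A numerical sketch of this idea on synthetic data (the regression-residual method is one standard way to compute a partial correlation): here y depends on x3 only through x1, so the raw correlation is large while the partial correlation given x1 is near zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x3 = x1 + rng.normal(size=n)
y = 2 * x1 + rng.normal(size=n)  # y is related to x3 only through x1

def residuals(a, b):
    """Residuals of regressing a on b (with an intercept)."""
    X = np.column_stack([np.ones_like(b), b])
    coef, *_ = np.linalg.lstsq(X, a, rcond=None)
    return a - X @ coef

raw = np.corrcoef(y, x3)[0, 1]
partial = np.corrcoef(residuals(y, x1), residuals(x3, x1))[0, 1]
print(round(raw, 2), round(partial, 2))  # raw is large, partial is near 0
```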
# We can then plot this relationship/ Partial correlation: pacf
result = plot_pacf(df2["Seasonal First Difference"].dropna())
Typically a sharp drop after lag "k" suggests an AR-k model should be used. If there is a gradual decline, it suggests an MA model.
Identification of an AR model is often best done with the PACF. For an AR model, the theoretical PACF "shuts off" past the order of the model. The phrase "shuts off" means that in theory the partial autocorrelations are equal to 0 beyond that point. Put another way, the number of non-zero partial autocorrelations gives the order of the AR model. By the "order of the model" we mean the most extreme lag of x that is used as a predictor.

Identification of an MA model is often best done with the ACF rather than the PACF. For an MA model, the theoretical PACF does not shut off, but instead tapers toward 0 in some manner. A clearer pattern for an MA model is in the ACF. The ACF will have non-zero autocorrelations only at lags involved in the model.
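A synthetic sketch of this "shut-off" behaviour (toy AR(1) data; regressing on two lags is a simple stand-in for the lag-2 partial autocorrelation): for an AR(1) process, the coefficient on the second lag comes out near zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + rng.normal()  # AR(1) with phi = 0.7

# OLS of y_t on y_{t-1} and y_{t-2}: the lag-2 coefficient plays the
# role of the lag-2 partial autocorrelation
X = np.column_stack([np.ones(n - 2), y[1:-1], y[:-2]])
coef, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
print(coef.round(2))  # lag-1 coefficient near 0.7, lag-2 near 0
```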
# Register converters to avoid warnings
pd.plotting.register_matplotlib_converters()
plt.rc("figure", figsize=(16,8))
plt.rc("font", size=14)
We've run quite a few plots, so let's just quickly get our "final" ACF and PACF plots. These are the ones we will be referencing in the rest of the notebook below.
plot_acf(df2['Seasonal First Difference'].dropna())
plot_pacf(df2['Seasonal First Difference'].dropna())
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(df2['Seasonal First Difference'].iloc[13:],lags=40,ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(df2['Seasonal First Difference'].iloc[13:],lags=40,ax=ax2)
Finally we can use our ARIMA model now that we have an understanding of our data!
# For non-seasonal data
from statsmodels.tsa.arima_model import ARIMA
help(ARIMA)
- p: the number of lag observations included in the model (the AR order);
- d: the number of times that the raw observations are differenced, also called the degree of differencing;
- q: the size of the moving average window, also called the order of the moving average.
df2['AAPL']
# df2.index = pd.DatetimeIndex(df2.index).to_period('M')
# We have seasonal data!
model = sm.tsa.statespace.SARIMAX(df2['AAPL'],order=(0,1,0), seasonal_order=(1,1,1,12))
results = model.fit()
print(results.summary())
results.resid
results.resid.plot()
# Density distribution of the residuals
results.resid.plot(kind='kde')
First, we can get an idea of how well our model performs by predicting values that we actually already know:
#Set of columns
df2.columns
# Examine the shape of the DataFrame (again)
print(df2.shape)
df2['forecast'] = results.predict(start =2000,end=9999,dynamic=False)
df2.tail()
df2.columns
df2[['AAPL','forecast']].plot(figsize=(12,8))
This requires more time periods, so let's create them with pandas and append them to our original dataframe!
df2.head()
# https://pandas.pydata.org/pandas-docs/stable/timeseries.html
# Alternatives
# pd.date_range(df.index[-1],periods=12,freq='M')
df2.tail()
from pandas.tseries.offsets import DateOffset
future_dates = pd.date_range(df2.index[-1],periods=12,freq='M')
#future_dates = [df2.index[-1] + DateOffset(months=x) for x in range(0,24) ]
future_dates
future_df2 = pd.DataFrame(index=future_dates, columns=df2.columns)
final_df2 = pd.concat([df2, future_df2])
final_df2.head()
final_df2.tail()
print(future_df2.shape)
final_df2['forecast']= results.predict(start =2000, end = 9999, dynamic= True)
final_df2[['AAPL', 'forecast']].plot(figsize=(12, 8))
# install cufflinks via the anaconda prompt: pip install plotly and pip install cufflinks
#install library packages
import plotly
import cufflinks as cf
import pandas as pd
import numpy as np
#Enabling the offline mode for interactive plotting locally
from plotly.offline import download_plotlyjs,init_notebook_mode,plot,iplot
init_notebook_mode(connected=True)
cf.go_offline()
#To display the plots
%matplotlib inline
# Get, check and clean the Data for analysis
df_G7=pd.read_csv('stock_px_2.csv')
# check the head of DataFrame df
df_G7.head()
# B_1: Rename the column 'Unnamed: 0' (rename returns a copy; df_G7 itself is unchanged)
df_G7 = pd.read_csv('stock_px_2.csv')
df_G7.rename(columns={'Unnamed: 0':'Dates'})
# B_2: Assign the renamed DataFrame
df_G8 = df_G7.rename(columns={'Unnamed: 0':'Dates'})
# check the head of DataFrame df
df_G8.head()
# Use the cleaned dataset for analysis
Stock_1 = df_G8
# Display the dataset
Stock_1
# check the index of the columns
Stock_1.columns
#Selecting columns to display in a "simple plot"
Stock_1[['AAPL', 'MSFT', 'XOM', 'SPX']].iplot()
(Output of `help(ARIMA)` from `statsmodels.tsa.arima_model`: the full class docstring, describing the `endog`, `order=(p,d,q)`, `exog`, `dates`, and `freq` parameters; the `fit` method with its `method` choices 'css-mle', 'mle', and 'css', trend and solver options; and the `predict` method with `start`, `end`, `dynamic`, and `typ` ('linear' or 'levels') arguments.)